SUMMARY

This paper introduces probabilistic choice to synchronous parallel
machine models; in particular parallel RAMs. The power of probabilistic
choice in parallel computations is illustrated by O(log n) time
algorithms for connectivity and recognising bipartite graphs, by
O(log n)^2 time algorithms for testing if a graph has a perfect matching,
and by an O(n) time test for irreducibility of polynomials over finite
fields. We characterize the computational complexity of time, space, and
processor bounded probabilistic parallel RAMs in terms of the
computational complexity of probabilistic sequential RAMs. We show that
parallelism uniformly speeds up time bounded probabilistic sequential
RAM computations by nearly a quadratic factor. We also show that
probabilistic choice can be eliminated from parallel computations by
introducing nonuniformity.


1. INTRODUCTION


Probabilistic choice is the use of randomly chosen moves in an otherwise
deterministic computation given a fixed input. The introduction of
probabilistic choice in sequential computations leads to considerable
improvement to the computational complexity of various number theoretic
problems [Berlekamp, 70], [Rabin, 74], [Solovay and Strassen, 77],
[Adleman, Manders, and Miller, 75], [Rabin, 80], [Zippel, 79], to
combinatorial problems on graphs and matroids [Lovasz, 80], to testing
polynomial identities [Schwartz, 80], and to testing program equivalence
[Ibarra and Moran, 80].

Recently, [Rabin, 80], [Lehman and Rabin, 80], [Frances and Rodeh, 80],
and [Reif and Spirakis, 81 and 82] have utilized probabilistic choice in
synchronisation algorithms for asynchronous multiprocess systems.

This paper investigates the use of probabilistic choice in synchronous
parallel machines. We present a pair of simulation results (Theorems 4.1
and 4.2) which relate probabilistic sequential and probabilistic parallel
computations on RAMs. By parallel simulation of previously known
probabilistic sequential algorithms [Aleliunas, et al., 79], our
Theorem 4.1 immediately yields as corollaries the fastest known parallel
algorithms for a variety of combinatorial problems, such as an O(log n)
time test if there exists a path between two vertices of an undirected
graph and an O(log n) time test if a graph is bipartite. Both these
probabilistic parallel algorithms use O(n^3 log n) processors. Previously
the fastest known parallel algorithm for these problems required
O(log n)^2 time [Csanky, 76].

We give O(log n)^2 time probabilistic P-RAM algorithms for testing if
a graph of n vertices has a perfect matching, and an O(n) time test if a
polynomial of degree O(n) has a root over GF(p^n). (Also, recently
[Reischuk, 81] has shown that a probabilistic parallel RAM can sort in
time O(log n) with O(n) processors.)

We have an interesting theoretical result (Theorem 5) for speeding up
a log-cost (unit-cost, respectively) probabilistic sequential RAM
computation of time T(n), by simulation on a probabilistic parallel RAM
in log-cost time O(T(n)^{1/2} log T(n)) (in unit-cost time
O(T(n) (log T(n)) log(T(n)I(n)))^{1/2}, respectively, where I(n) is the
maximum integer operated upon by the simulated unit-cost probabilistic
RAM). Previously, [Dymond, 80] proved a quadratic speedup of
deterministic multitape Turing machines; however, he considered the
simulation of neither probabilistic machines nor RAMs.

[Adleman, 78] has previously proved that probabilistic choice can be
eliminated in sequential computations if there is no error of acceptance.
Theorem 6 of Section 6 proves that probabilistic choice can be eliminated
from probabilistic parallel RAMs with both errors of acceptance and errors
of rejection by introducing nonuniformity, with some increase of time and
processor bounds which may be traded off. This implies there exist
nonuniform deterministic parallel RAMs which can in unit-cost time
O(log n) test if a graph of n vertices is connected, and in time
O(log n)^2 test if a graph of n vertices has a perfect matching, and in
time O(n) test if a polynomial of degree O(n) has a root in GF(p^n).

2. DEFINITIONS OF PROBABILISTIC MACHINES


2.1 Abstract Machine Types

Before describing our probabilistic parallel machines, it is useful to
define probabilistic (and also deterministic and nondeterministic) machine
types abstractly, without reference to the particular details of operation
of the machines.

Let M be a fixed machine. A configuration of M is a finite string
I over a fixed finite alphabet describing the current state and storage
contents of M. Let 𝒥 be the set of configurations of M. Let 𝒜 ⊆ 𝒥
be the set of accepting configurations of M. Let Σ be the finite input
alphabet of M. Given an input string ω ∈ Σ*, let I_0(ω) ∈ 𝒥 be the
corresponding initial configuration of M. Let ⊢ ⊆ 𝒥 × 𝒥 be the next
move relation for M; for each I ∈ 𝒥, NEXT(I) = {I' | I ⊢ I'} is the set
of possible configurations derived from I by a single move of M. (We
assume there is no next move from an accepting configuration.) In a
nondeterministic machine, any I' ∈ NEXT(I) may be chosen
nondeterministically. In a probabilistic machine, each I' ∈ NEXT(I) is
chosen with equal probability, independently of previous and succeeding
choices. In a deterministic machine M, |NEXT(I)| ≤ 1 for all I ∈ 𝒥.

Given a fixed input string ω ∈ Σ*, a computation sequence of M is a
maximal length sequence of configurations I_0, I_1, ... such that
I_0 = I_0(ω) and I_{i-1} ⊢ I_i for i = 1, 2, ... . The computation
sequence is accepting if it is finite and the last configuration is
accepting. In a deterministic or nondeterministic machine, M accepts ω
iff there exists an accepting computation sequence from I_0(ω). In a
probabilistic machine, M accepts ω iff Prob{COMP(ω) is accepting} > 1/2,
where COMP(ω) is a random computation sequence from I_0(ω) (generated by
random next moves as defined above). Let the language accepted by M be
L(M) = {ω ∈ Σ* | M accepts ω}.
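
The acceptance condition for a probabilistic machine can be illustrated on a small machine given explicitly by its NEXT relation. The following Python sketch computes the probability that a random computation sequence is accepting, exactly, using rational arithmetic. The dictionary encoding of NEXT, the step budget `max_steps`, and the toy machine are assumptions made for this illustration only; they are not part of the model defined above.

```python
from fractions import Fraction


def accept_probability(next_moves, accepting, config, max_steps):
    """Exact probability that a random computation sequence starting at
    `config` is accepting, following at most `max_steps` moves."""
    if config in accepting:
        return Fraction(1)  # no next move from an accepting configuration
    successors = next_moves.get(config, [])
    if not successors or max_steps == 0:
        return Fraction(0)  # halted without accepting, or budget exhausted
    share = Fraction(1, len(successors))  # each I' in NEXT(I) is equally likely
    return sum(
        share * accept_probability(next_moves, accepting, nxt, max_steps - 1)
        for nxt in successors
    )


def accepts(next_moves, accepting, start, max_steps):
    """Probabilistic acceptance: accepting probability strictly above 1/2."""
    return accept_probability(next_moves, accepting, start, max_steps) > Fraction(1, 2)


# Hypothetical toy machine: three equally likely first moves, one of which
# leads to a rejecting dead end.
toy_next = {"start": ["a", "b", "c"], "a": ["acc"], "b": ["acc"], "c": []}
toy_accepting = {"acc"}
```

With this toy machine, two of the three first moves lead to the accepting configuration, so the machine accepts under the > 1/2 rule.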

2.2  Error  Restricted  Probabilistic  Machines 

Let M be a probabilistic machine which accepts language L(M). Let
the acceptance error e_A(n) and the rejection error e_R(n) be the minimum
functions such that for all n > 0, ω ∈ Σ^n,

(i) if ω ∉ L(M) then

Prob{COMP(ω) is accepting} ≤ e_A(n)

(ii) if ω ∈ L(M) then

Prob{COMP(ω) is not accepting} ≤ e_R(n).

Note that by definition e_A(n) ≤ 1/2 and e_R(n) ≤ 1/2.

For deterministic or nondeterministic machines M, M', let M ≡ M' if
L(M) = L(M'). For two probabilistic machines M, M', let M ≡ M' if they
accept the same language and have both the same error of acceptance and
the same error of rejection.

Let M be a BP-probabilistic machine if there exists a constant
e < 1/2 such that for all n > 0, e ≥ max(e_A(n), e_R(n)). Thus a
BP-probabilistic machine has a constant upper bound, which is less than
1/2, on errors of acceptance and rejection.

Let M be an R-probabilistic machine if there exists a constant
e < 1/2 such that for all n > 0, e ≥ e_R(n), and M never has an accepting
computation on any input string ω ∈ Σ* - L(M).

2.3 Probabilistic Sequential Machines

A nondeterministic Turing machine may be made a probabilistic Turing
machine by allowing next moves to be chosen randomly with equal
probability, as described in Sec. 2.1. See [Simon, 75] for a discussion
of probabilistic Turing machines with unrestricted errors and see
[Adleman, 78] for some results for R-probabilistic Turing machines.
[Bennett and Gill, 81] discuss these and various other classes of
probabilistic Turing machines.

Our principal sequential machine model is the probabilistic Random
Access Machine (RAM), which is defined here similarly to [Aho, Hopcroft
and Ullman, 74], except we allow the RAM probabilistic choice. A
probabilistic RAM consists of

(1) an infinite sequence of memory locations m_0, m_1, ... each of which
is indexed by and contains a nonnegative integer

(2) a fixed set of registers R each of which contains a nonnegative
integer

(3) a probabilistic finite state control which allows the following
operations:

(a) for any registers r_1, r_2 ∈ R, load (or read) the contents of r_1
into (or from, respectively) the contents of global memory
location m_i, where i is the current contents of register r_2.

(b) for any registers r_1, r_2, r_3 ∈ R, apply an addition, subtraction,
multiplication, or division operation on the contents of
registers r_1, r_2 and load the result into register r_3.

(Note: we round noninteger rationals to the next lower integer. Also, we
substitute 0 for the result of a subtraction which is negative.)

A unit-cost RAM is charged 1 step for each of the above operations; a
log-cost RAM is charged ⌈log(x+2)⌉ steps for each of the above operations
which are on integers of size x.
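
The arithmetic conventions and the two charging schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the logarithm is taken to base 2, x is taken to be the largest operand of the operation, division by zero is not handled (the text does not address it), and the function names are hypothetical.

```python
def ram_add(a, b):
    return a + b


def ram_sub(a, b):
    # A negative difference is replaced by 0.
    return max(a - b, 0)


def ram_mul(a, b):
    return a * b


def ram_div(a, b):
    # Noninteger quotients are rounded to the next lower integer.
    return a // b


def log_cost(x):
    # ceil(log2(x + 2)) for an integer x >= 0, computed without floats:
    # for m >= 2, ceil(log2(m)) equals (m - 1).bit_length().
    return (x + 1).bit_length()


def step_charge(operands, unit_cost):
    """Steps charged for one operation on the given operands."""
    if unit_cost:
        return 1
    return log_cost(max(operands))  # assumption: x is the largest operand
```

For example, an operation on the integers 6 and 2 costs 1 step on a unit-cost RAM and ⌈log 8⌉ = 3 steps on a log-cost RAM under these assumptions.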

We assume a binary input alphabet {0,1}. Given an input string
ω ∈ {0,1}*, each memory location m_{i-1} initially contains the i-th bit
of ω for 1 ≤ i ≤ |ω|, m_n contains 2, and all other memory locations and
registers are initially 0. The memory locations m_0, ..., m_n are
read-only, and cannot be loaded into. Also, we assume the finite control
has distinguished initial and accepting states. A configuration is
accepting if the machine is in the accepting state. The probabilistic RAM
accepts input ω if with probability > 1/2 a random computation sequence
is accepting. The probabilistic RAM has time bound T(n) (space bound
S(n), integer bound I(n)) if on all inputs of length n and accepting
computation sequences, the machine takes ≤ T(n) steps (uses ≤ S(n) space,
operates on integers ≤ I(n), respectively). Note that we have defined
steps differently for unit-cost and log-cost RAMs. Furthermore, a
log-cost RAM (unit-cost RAM, respectively) is charged log(x+2) (1,
respectively) units of space for each noninput memory location and
register utilised in an accepting computation, where x is the largest
integer stored in that memory location or register.

2.4  Probabilistic  Parallel  RAMs 

Our principal parallel machine model is the Parallel Random Access
Machine (P-RAM), similar to that defined in [Fortune and Wyllie, 78] and
[Wyllie, 79]. However, we allow these machines probabilistic choice.
Initially, given an input string ω ∈ {0,1}*, a probabilistic P-RAM
consists of a single probabilistic RAM initialized as defined in 2.3,
with an additional operation: fork, which allows the original RAM to
create a new "clone" RAM sharing the same memory, with copies of the
original RAM's registers with the same contents, with an identical finite
state control, and initialized at some given state. Any new RAMs may also
create new RAMs by the fork operation. All these RAMs operate
synchronously with the original RAM. Furthermore, their probabilistic
choices are assumed to be independent. RAMs are allowed to simultaneously
read the same memory location. However, if two distinct RAMs
simultaneously load into the same memory location, then the entire
computation of the P-RAM fails. If on a particular computation sequence
the original RAM enters its accept state and there have been no such
simultaneous memory load conflicts, then this computation sequence is
considered to be accepting. The probabilistic P-RAM accepts an input
string ω ∈ {0,1}* if with probability > 1/2 a random computation sequence
is accepting. (See 2.2 for definitions of errors of acceptance and
rejection.) The probabilistic P-RAM has time bound T(n) (space bound
S(n), integer bound I(n), processor bound P(n)) if on all inputs of
length n and accepting computation sequences, the machine takes ≤ T(n)
steps (uses ≤ S(n) space, operates on integers ≤ I(n), uses ≤ P(n)
processors, respectively). Note that space and time are charged in units
depending on whether the machine is unit-cost or log-cost as defined
in 2.3.

3. SOME FAST PROBABILISTIC PARALLEL ALGORITHMS


This section describes some time efficient algorithms for probabilistic
P-RAMs which we easily derive by parallelizing known probabilistic
sequential algorithms. (Section 4 gives a uniform method for parallelizing
any probabilistic sequential algorithm.) All the algorithms described here
are R-probabilistic: with rejection error < 1/2 (and no errors of
acceptance) if the probabilistic trials are made twice.

THEOREM 3.1. There are unit-cost R-probabilistic P-RAMs with time bound
O(log n) and processor bound O(n^3 log n), which given a graph G with n
vertices,

(a) can test if G has a path between two given vertices, and

(b) can also test if G is bipartite.

Proof. [Aleliunas, et al., 79] give for these problems R-probabilistic
sequential algorithms which can be implemented on a probabilistic
RAM in O(1) space (using integers of size ≤ n for representing edges) and
O(n^3) time. Our probabilistic parallel algorithms are derived immediately
by applying Theorem 4.1. □

Note that the fastest known deterministic P-RAM algorithm for testing
connectivity requires O(log n)^2 time and O(n^5) processors [Csanky, 76].

THEOREM 3.2. A unit-cost R-probabilistic P-RAM with time bound
O(log n)^2 and processor bound O(n^5) can test if a graph of n vertices
has a perfect matching.

Proof. Let G = (V,E) be a simple graph with vertices V = {1,...,n}.
[Lovasz, 80] gives a probabilistic sequential algorithm which chooses an
N = n^{O(1)} and constructs a skew-symmetric n × n matrix B = (B_ij)
where for 1 ≤ i, j ≤ n

(a) B_ij is a random element of {1,...,N} if i < j and (i,j) ∈ E.

(b) B_ij = -B_ji if i > j and (i,j) ∈ E.

(c) B_ij = 0 otherwise.

If the determinant of B is not 0 then G has a perfect matching.
If the determinant of B is 0, then for N sufficiently large, G
has a perfect matching with probability < 1/2. The parallel matrix
inversion algorithm of [Csanky, 76] can be used to compute the determinant
in time O(log n)^2 and O(n^5) processors on a P-RAM. □
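
The sequential test described in the proof can be sketched in Python. This is a minimal illustration, not the parallel algorithm: the determinant is computed by ordinary exact Gaussian elimination rather than by the parallel method of [Csanky, 76], vertices are numbered from 0, and the choice N = n^3 as well as all function names are assumptions made for the example.

```python
import random
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for k in range(col, n):
                a[r][k] -= factor * a[col][k]
    return det


def random_matrix_b(n, edges, big_n, rng):
    """Matrix B of the proof: random entries above the diagonal on edges,
    negated mirror entries below, zeros elsewhere."""
    b = [[0] * n for _ in range(n)]
    for i, j in edges:
        lo, hi = min(i, j), max(i, j)
        value = rng.randint(1, big_n)
        b[lo][hi] = value
        b[hi][lo] = -value
    return b


def probably_has_perfect_matching(n, edges, trials=2, seed=None):
    """True only if some trial yields a nonzero determinant, so a True
    answer is never wrong; a False answer may be wrong with small probability."""
    rng = random.Random(seed)
    big_n = max(2, n) ** 3  # assumed choice of N = n^O(1)
    return any(
        determinant(random_matrix_b(n, edges, big_n, rng)) != 0
        for _ in range(trials)
    )
```

On the path 0-1-2-3 the determinant is the square of a product of two nonzero entries, so the test always answers True; on a triangle or a star with three leaves the determinant is identically 0 and the test always answers False.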

THEOREM 3.3. A unit-cost R-probabilistic P-RAM with O(n + (log(nm))^2)
time bound and O(n+m) processor bound can test if a polynomial f(x)
of degree m has a root in GF(p^n), where p is a fixed prime.

Proof. We parallelize the probabilistic algorithm of [Rabin, 80]
(which generalized and proved validity for a previous algorithm of
[Berlekamp, 70] for GF(p)). (This algorithm can be implemented
on a unit-cost probabilistic sequential RAM in time O(n^2 m).) First,
compute f_1(x) = GCD(f(x), x^(p^n - 1) - 1). If f_1(x) = 1 then f(x) has
no roots over GF(p^n). Otherwise, choose a random δ ∈ {0, 1, ..., p^n - 1}
and compute f_δ(x) = GCD(f_1(x), (x+δ)^((p^n - 1)/2) - 1). Let d_1, d_δ be
the degrees of polynomials f_1(x), f_δ(x) respectively. If 0 < d_δ < d_1
then f(x) has a root in GF(p^n) (in this case f(x) has factor f_δ(x) if
2d_δ ≤ d_1 and factor f_1(x)/f_δ(x) if 2d_δ > d_1), and otherwise f(x) is
irreducible in GF(p^n) with probability > 1/2. The required polynomial GCD
computations can be done by a unit-cost P-RAM in O(log(nm))^2 time and
O(n+m) processors by using the shuffle-exchange network of [Stone, 71] to
compute the convolutions required for the polynomial GCD algorithm of
[Aho, Hopcroft, and Ullman, 74]. The exponentiations can be computed in
O(n) parallel time by repeated exponentiation. □
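
The first GCD step of the proof can be sketched sequentially in Python for the special case n = 1, i.e. over the prime field GF(p). This is an illustration only: it does not cover GF(p^n) for n > 1, it omits the random shift by δ, it checks the root 0 separately (since x^(p-1) - 1 vanishes exactly on the nonzero field elements), and it uses schoolbook polynomial arithmetic rather than the parallel convolution method. Polynomials are lists of coefficients, lowest degree first; all function names are hypothetical.

```python
def trim(poly):
    """Drop zero coefficients of highest degree (in place)."""
    while poly and poly[-1] == 0:
        poly.pop()
    return poly


def poly_mod(a, m, p):
    """Remainder of a modulo m over GF(p); m must be trimmed and nonzero."""
    a = trim([c % p for c in a])
    inv_lead = pow(m[-1], -1, p)  # modular inverse (Python 3.8+)
    while len(a) >= len(m):
        factor = a[-1] * inv_lead % p
        shift = len(a) - len(m)
        for k, c in enumerate(m):
            a[shift + k] = (a[shift + k] - factor * c) % p
        trim(a)
    return a


def poly_mulmod(a, b, m, p):
    if not a or not b:
        return []
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] = (prod[i + j] + x * y) % p
    return poly_mod(prod, m, p)


def poly_powmod(base, exponent, m, p):
    """base^exponent modulo m over GF(p), by repeated squaring."""
    result = [1]
    base = poly_mod(base, m, p)
    while exponent > 0:
        if exponent & 1:
            result = poly_mulmod(result, base, m, p)
        base = poly_mulmod(base, base, m, p)
        exponent >>= 1
    return result


def poly_gcd(a, b, p):
    a, b = trim([c % p for c in a]), trim([c % p for c in b])
    while b:
        a, b = b, poly_mod(a, b, p)
    return a


def has_root_in_prime_field(f, p):
    f = trim([c % p for c in f])
    if len(f) < 2:
        return not f  # zero polynomial: every element is a root; constant: none
    if f[0] == 0:
        return True  # x = 0 is a root
    g = poly_powmod([0, 1], p - 1, f, p) or [0]
    g[0] = (g[0] - 1) % p  # g = x^(p-1) - 1 reduced modulo f
    return len(poly_gcd(f, g, p)) >= 2  # nonconstant GCD means a nonzero root
```

For instance, x^2 + 1 has no root modulo 7, while x^2 - 2 has the roots 3 and 4 modulo 7.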

(Note that the fastest known deterministic sequential algorithms
[Adleman, 80] and [Adleman and Odlyzko, 81] for testing
if a polynomial of degree n has a root over GF(p^n)
require time O(log n)^{log(log(log n))}. These algorithms can be sped up
by our Theorem 5 to O(log n)^{(1/2) log(log(log n)) + 1} parallel time on
a deterministic P-RAM, but the resulting parallel algorithms remain very
slow in comparison to those provided by Theorem 3.3.)

THEOREM 3.4. A unit-cost R-probabilistic P-RAM with time bound
O(log n) and processor bound O(n^2/log n), given n × n integer matrices
A, B, C, can test A·B ≠ C.

Proof. Choose a random column vector x ∈ {-1,1}^n and test
A(Bx) ≠ Cx. This test can be done by a probabilistic P-RAM within time
O(log n) and processor bound O(n^2/log n) by forming n/log n binary
trees of processors, each of size 2n and depth O(log n), and pipelining
the required dot products. [Freivalds, 79] shows that if A·B ≠ C then
Prob{A(Bx) ≠ Cx} ≥ 1/2. □
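
A sequential sketch of this test in Python follows. It only illustrates the random-vector check; the processor trees and pipelining of the parallel version are not modelled, and the function names and the default of two trials are assumptions made for the example.

```python
import random


def mat_vec(m, v):
    """Matrix-vector product with exact integer arithmetic."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def product_differs(a, b, c, trials=2, rng=None):
    """Return True if some random x in {-1, 1}^n witnesses A(Bx) != Cx.
    A True answer proves A*B != C; a False answer may be wrong when
    A*B != C, with probability at most 1/2 per trial."""
    rng = rng or random.Random()
    n = len(c)
    for _ in range(trials):
        x = [rng.choice((-1, 1)) for _ in range(n)]
        if mat_vec(a, mat_vec(b, x)) != mat_vec(c, x):
            return True
    return False
```

Each trial costs three matrix-vector products, i.e. O(n^2) arithmetic operations, instead of the O(n^3) operations of the naive matrix product.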

Note that the naive algorithm for testing A·B ≠ C in time O(log n)
on a deterministic P-RAM requires at least n^3/log n processors.

4. SIMULATION RESULTS BETWEEN PROBABILISTIC RAMs AND PROBABILISTIC P-RAMs

[Fortune and Wyllie, 78] and [Wyllie, 78] characterise the
computational complexity of their deterministic P-RAMs in terms of the
complexity of deterministic complexity classes. It is the aim of this
section to do the same for our probabilistic P-RAMs. Our simulation
methods are similar, except for the use of probabilistic choice to ensure
the probability of errors of acceptance and rejection are preserved.

4.1 Simulation of a Probabilistic RAM by a Probabilistic P-RAM

THEOREM 4.1. Let M be a probabilistic RAM with constructible time
bound T(n) ≥ n, space bound S(n) ≥ log n, and integer bound I(n). Then
there is a probabilistic P-RAM M' such that M ≡ M' (see 2.2 for definition
of the equivalence relation ≡). If M is unit-cost then M' has unit-cost
time bound O(S(n) log I(n) + log T(n)), and processor bound
O(I(n)^{S(n)} T(n)); if M is log-cost then M' has log-cost time bound
O(S(n) + log T(n))^2 and processor bound O(4^{S(n)} T(n)).

(Note: Theorem 4.1 gives a speed-up for unit-cost RAMs only if
S(n) log I(n) < T(n); Theorem 5.1 provides a uniform quadratic speed-up
even if S(n) = T(n).)

Proof. Fix some input string ω ∈ Σ^n and let I_0(ω) be the initial
configuration of M. Let 𝒥 be the set of configurations of M with
space S(n). Let p = |𝒥| (T(n) + 1). Let each I ∈ 𝒥 and each t,
0 ≤ t ≤ T(n), be encoded as a distinct integer i = <I,t>, where 1 ≤ i ≤ p.
We can assume that the encoding and its decoding are computed in O(log p)
steps on a P-RAM.

Our simulating probabilistic P-RAM M' will begin by a series of fork
operations yielding RAMs M_1, ..., M_p. Each RAM M_i, 1 ≤ i ≤ p, has a
local register r_i and an associated global memory location NEXT_i which
is initialised as follows: suppose i = <I,t>; then if I has any immediate
successor I', let M_i randomly choose some such I' and load <I',t+1>
into NEXT_i, and otherwise if I has no successors then let M_i load i
into NEXT_i. After this initialization, each M_i, for 1 ≤ i ≤ p,
synchronously:

(1) loads the contents of NEXT_j into register r_i, where j is the
contents of NEXT_i, and

(2) then loads NEXT_i with the contents of r_i.

This is repeated ⌈log p⌉ times. We can assume <I_0(ω),0> = 1 and M_1 is
the original RAM of M'. We let M_1 enter the accepting state (so M'
accepts) if NEXT_1 ever contains integer <I,t> where I is an accepting
configuration of M.

If M' accepts on a particular computation, then there must be a
sequence of memory locations NEXT_<I_0,0>, ..., NEXT_<I_{t-1},t-1>
initialized to <I_1,1>, ..., <I_t,t>, where I_0, I_1, ..., I_t is an
accepting computation sequence of M, and t ≤ T(n). Thus the memory
essentially forms a path from NEXT_<I_0,0> to NEXT_<I_t,t>. On each
iteration of steps (1) and (2), the length of this path decreases by a
factor of 1/2. Thus after ⌈log p⌉ iterations, NEXT_<I_0,0> contains
<I_t,t>.

Suppose I_0, I_1, ... is an execution sequence of M, derived from a
particular sequence of probabilistic choices ρ. Suppose also that the RAMs
of M' make a sequence of probabilistic choices ρ' such that M_<I_t,t>
initially loads NEXT_<I_t,t> with <I_{t+1},t+1> for t = 0, 1, ..., T(n)-1.
Then M errs on acceptance (rejection, respectively) of ω when making
probabilistic choices ρ iff M' errs on acceptance (rejection,
respectively) of ω when making probabilistic choices ρ'. Since ρ and
ρ' are chosen randomly, it follows that M ≡ M'. If M is unit-cost,
|𝒥| ≤ I(n)^{S(n)}; so if M' is also considered to be unit-cost the time
and space bound is O(log p) = O(S(n) log I(n) + log T(n)) and the
processor bound is p = O(I(n)^{S(n)} T(n)). If M is log-cost,
|𝒥| ≤ 2^{2S(n)} = 4^{S(n)}, so if M' is also considered to be log-cost
its time bound is O(log p)^2 = O(S(n) + log T(n))^2 and processor bound
is p = O(4^{S(n)} T(n)). □
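
The repeated steps (1) and (2) of the proof are a pointer-doubling loop. The following Python sketch simulates the synchronous rounds sequentially on a plain list; it is an illustration only, with 0-based indices instead of the encoding <I,t>, and with a hypothetical function name.

```python
def pointer_jump(next_ptr):
    """Apply ceil(log2 p) synchronous rounds of NEXT[i] <- NEXT[NEXT[i]].
    `next_ptr[i]` is the index stored in NEXT_i; a terminal cell points
    to itself."""
    nxt = list(next_ptr)
    p = len(nxt)
    rounds = (p - 1).bit_length() if p > 1 else 0  # ceil(log2 p) without floats
    for _ in range(rounds):
        # Step (1): every RAM reads NEXT_j, j = NEXT_i, into its register.
        regs = [nxt[nxt[i]] for i in range(p)]
        # Step (2): every RAM then loads its register back into NEXT_i.
        nxt = regs
    return nxt
```

Because all reads of a round happen before any load, each cell's pointer skips twice as far after every round, so the first cell reaches the end of its path after at most ⌈log p⌉ rounds.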

4.2  Simulation  of  a  Probabilistic  P-RAM  by  Probabilistic  RAM 

THEOREM 4.2. Let M be a probabilistic P-RAM with time bound T(n),
space bound S(n), and processor bound P(n). Then there is a probabilistic
RAM M' with space bound O(S(n) + P(n)) such that M ≡ M'. Furthermore, if
M is unit-cost then M' has unit-cost time bound O(T(n)P(n)), and if M
is log-cost then M' has log-cost time bound O(T(n) P(n) log P(n)).

Proof. The simulating probabilistic RAM will have only 5 registers;
the first register of M' will store an integer p giving the total
number of RAMs currently being executed, and the second register of M'
will store an integer designating the RAM currently being simulated; the
other 3 registers of M' will be used for arithmetic operations and
indirect addressing of memory locations. Suppose each RAM of M has r
registers. The registers of the simulated RAMs of M will be stored in a
special block of memory locations, which is increased by r+1 on every
fork operation. The simulation of M by M' is straightforward; on each
move of M, M' must simulate a move by each of the currently active RAMs
of M. This requires O(P(n)) steps if M' is unit-cost, and O(P(n) log P(n))
steps if M' is log-cost. By storing two copies of the memory of M, it
is easy to detect simultaneous load conflicts. M' is allowed to enter its
accepting state just when the original RAM of M enters its accepting state
and there are no simultaneous load conflicts. Since the probabilistic
choices taken by the individual probabilistic RAMs are assumed to be
independent, and the simulating probabilistic RAM M' takes independent
probabilistic choices, the probability of errors of acceptance and
rejection of M and M' are identical, so M ≡ M'. □
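
One way to picture the use of two memory copies for conflict detection is the following Python sketch of a single synchronous step. The representation of memory as a dictionary, the list of pending loads, and the function name are assumptions made for this illustration; the proof itself does not fix these details.

```python
def synchronous_step(memory, loads):
    """Apply the loads issued by the active RAMs during one synchronous step.
    `loads` is a list of (address, value) pairs, one per RAM that loads.
    Returns the updated memory, or None if two RAMs load into the same
    location (in which case the P-RAM computation fails)."""
    new_memory = dict(memory)  # second copy; reads in this step still see `memory`
    loaded = set()
    for address, value in loads:  # RAMs are handled one after another
        if address in loaded:
            return None  # simultaneous load conflict
        loaded.add(address)
        new_memory[address] = value
    return new_memory
```

Handling the RAMs one after another is what costs the sequential machine a factor of P(n) per simulated move.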

5. PARALLEL SPEED-UP OF PROBABILISTIC RAMs

THEOREM 5.1. Let M be a probabilistic RAM with constructible time
bound T(n) ≥ n and integer bound I(n). Then there is a probabilistic
P-RAM M' such that M ≡ M' and if M is unit-cost then M' has unit-cost
time bound O(T(n) (log T(n)) log(T(n)I(n)))^{1/2} and if M is log-cost
then M' has log-cost time bound O(T(n)^{1/2} log T(n)).

Proof. Let ω ∈ {0,1}* be an input string of length n.

There is a constant c > 1 such that M has at most c choices for
next moves at each step. Thus the choices can be represented by a
sequence ρ_0, ρ_1, ..., ρ_{T(n)-1} where ρ_t ∈ {1,...,c}. The parallel
simulation of M by M' begins by probabilistically choosing
ρ_0, ..., ρ_{T(n)-1} in O(log T(n)) parallel time, and storing these
choices in distinct memory locations.

The fundamental idea (previously used in [Hopcroft, Paul, and Valiant,
75] and [Dymond, 80] for speed-up of deterministic Turing machines) is to
partition the T(n) steps into consecutive intervals of length L,
1 ≤ L ≤ T(n), to be determined below.

Let q be the number of states in the finite control of M. Suppose
in the following that M is unit-cost. Then M can read from and load
into at most 3L registers and memory locations within a time interval Δ
of length L. Furthermore, we can encode by a positive integer
≤ r = q(T(n)I(n))^{3L} the current state and the contents and addresses
of the registers and memory locations read from (or loaded into) during Δ.

(If M is log-cost, M can read from and load into at most 3L bits of
registers and memory locations within a time interval Δ of length L. Thus
we can encode this by a positive integer ≤ r, where r = q(4T(n))^{3L} in
the case M is log-cost.)

Let H = ⌈T(n)/L⌉ - 2. For each t = 0, L, 2L, ..., HL the simulating M'
constructs in global memory a table PREDICT_t. Given a positive
integer i ≤ r encoding a possible state of M and the contents and
addresses of all registers and memory locations to be read during time
interval Δ_t = {t, t+1, ..., t+L-1}, PREDICT_t(i) is a positive integer
≤ r encoding the contents and addresses of all registers and memory
locations to be loaded into during Δ_t using the predetermined choice
sequence ρ_t, ρ_{t+1}, ..., ρ_{t+L-1}. However, PREDICT_t(i) = 0 if this
choice sequence requires reading a register or memory location whose
contents are not defined by i, or if the contents of a register or memory
location are provided by i but are not read from. These tables can be
constructed in parallel by M' in time O(L + log r).

T(n) distinguished global memory locations of M' are used to store
the contents of the memory of M. Also, a special register is used to store
the state of the finite control of M. These are initialised as in the
initial configuration of M. The simulation of M by M' will then
proceed sequentially in H phases, each corresponding to a time interval
Δ_t for t = 0, L, 2L, ..., HL.

Suppose at the start of the phase corresponding to interval Δ_t, M'
is currently storing (as described above) the configuration I_t of M,
where I_0, I_1, ..., I_t is the sequence of configurations of M induced
from I_0 = I_0(ω) by the choice sequence ρ_0, ρ_1, ..., ρ_{t-1} chosen by
M' at the start of the simulation. Then there is a unique sequence of
configurations I_t, I_{t+1}, ..., I_{t+L} induced by the predetermined
choice sequence ρ_t, ρ_{t+1}, ..., ρ_{t+L-1}. Hence there is a unique
i_t, 1 ≤ i_t ≤ r, such that PREDICT_t(i_t) ≠ 0 and i_t encodes contents
of registers and memory locations consistent with I_t. PREDICT_t(i_t) is
decoded and is used to update the memory of M' to store the configuration
I_{t+L}. After the phase associated with time interval Δ_{HL}, M'
simulates M step by step sequentially for t = (H+1)L, (H+1)L+1, ..., T(n).
Let the original RAM of M' enter the accepting state if the simulated M
does. Since the choice sequence ρ_0, ..., ρ_{T(n)-1} is chosen randomly
by M', it induces a random computation sequence of M from I_0(ω), so
M ≡ M'.

In the case M is unit-cost, we let M' be unit-cost. The unit-cost
time for initialisation and computation of the PREDICT tables is
O(L + log r) = O(L log(T(n)I(n))). The unit-cost time for each phase is
O(log log r) = O(log(L log(T(n)I(n)))) since encoding and decoding of
elements of the PREDICT tables is done in parallel. There are ≤ T(n)/L
phases. Thus the total unit-cost time is

O(L log(T(n)I(n))) + (T(n)/L) O(log(L log(T(n)I(n)))) + L

= O(T(n) (log T(n)) log(T(n)I(n)))^{1/2},
for

L = (T(n) (log T(n)) / log(T(n)I(n)))^{1/2}.

In the case M is log-cost, we similarly let M' be log-cost. To
allow for O(log log r) parallel log-cost time access of the PREDICT
tables, the log r bits of each element of a PREDICT table must be stored
in distinct contiguous memory locations, instead of a single memory
location.

The log-cost time for initialisation and computation of the PREDICT
tables is O(L + log r) = O(L log T(n)). The log-cost time for each phase
is O(log log r) = O(log(L log T(n))). Thus the total log-cost time is

O(L log T(n)) + (T(n)/L) O(log(L log T(n))) + L = O(T(n)^{1/2} log T(n))
for

L = T(n)^{1/2}. □
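
The choice L = T(n)^{1/2} in the log-cost case can be checked numerically. The Python sketch below evaluates the three terms of the total time with all hidden constants set to 1 and logarithms taken to base 2; both are assumptions made for the illustration, and the function name is hypothetical.

```python
import math


def log_cost_total(t, interval):
    """Total log-cost time for T(n) = t and interval length L = interval,
    with every O(.) constant taken to be 1."""
    log_t = math.log2(t)
    table_time = interval * log_t  # building the PREDICT tables
    phase_time = (t / interval) * math.log2(interval * log_t)  # T/L phases
    tail_time = interval  # final step-by-step simulation
    return table_time + phase_time + tail_time


t = 2 ** 20
balanced = log_cost_total(t, 2 ** 10)  # L = T^(1/2)
target = math.sqrt(t) * math.log2(t)  # T^(1/2) log T
```

For T(n) = 2^20 the balanced choice stays within a small constant factor of T(n)^{1/2} log T(n), whereas a much shorter or much longer interval makes the phase term or the table term dominate.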

6. ELIMINATION OF PROBABILISTIC CHOICE IN PARALLEL COMPUTATIONS

Let M be a (uniform) probabilistic P-RAM with time bound T(n) and
processor bound P(n). Let S(n) be the maximum number of probabilistic
choices made by all the RAMs of M on any input of length n. (Note that
S(n) ≤ T(n)P(n).) Let e_A(n), e_R(n) be the acceptance and rejection error
functions for M, and let e(n) = max(e_A(n), e_R(n)). Also, let
X(n) = (1+2n)/log(1/(4e(n)(1-e(n)))). We assume e(n) < 1/2 so X(n) is finite.
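
To give a feel for the size of X(n), the following minimal sketch
evaluates the definition directly. It assumes the logarithm is base 2
(which matches the 2^{-n} error target used later); the function name is
hypothetical.

```python
import math

def trials_needed(n, e):
    """X(n) = (1 + 2n) / log2(1 / (4 e (1 - e))), defined for 0 < e < 1/2."""
    if not 0 < e < 0.5:
        raise ValueError("the definition needs 0 < e < 1/2")
    return (1 + 2 * n) / math.log2(1 / (4 * e * (1 - e)))
```

For a constant error such as e = 1/4, X(n) grows linearly in n; as e
approaches 1/2 it blows up; and when e is already exponentially small in
n it stays bounded by a constant.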

The following theorem states that we can eliminate the probabilistic
choice in M by introducing nonuniformity with advice bound A(n); i.e.,
we allow the nonuniform P-RAM to have in the initial configuration, for each
input length n ≥ 0, a distinguished sequence of A(n) memory locations,
each initialised to either 0 or 1 and fixed for all inputs of length n.

THEOREM 6. For any t(n), 1 ≤ t(n) ≤ X(n), there is a deterministic
nonuniform P-RAM which accepts L(M) with time bound O(T(n)t(n) +
log(X(n)/t(n))), processor bound O(P(n)X(n)/t(n)), and advice bound
O(X(n)S(n)).

Note: Thus to eliminate probabilistic choice we have a trade-off
between an increase in time bounds and an increase in processor bounds.
However, if e(n) decreases exponentially, then neither the time bound nor
the processor bound is asymptotically increased.
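
The trade-off can be tabulated with a small sketch. It assumes the bounds
of Theorem 6 with all constants dropped, writes t for the free parameter
ranging between 1 and X(n), and uses base-2 logarithms; the function name
and the sample values are hypothetical.

```python
import math

def derandomized_bounds(T, P, S, X, t):
    """(time, processors, advice) of Theorem 6 with constants dropped."""
    if not 1 <= t <= X:
        raise ValueError("need 1 <= t <= X")
    time = T * t + math.log2(X / t)   # t sequential trials per group, then a count
    processors = P * X / t            # X/t groups of P processors each
    advice = X * S                    # one stored choice per probabilistic choice
    return time, processors, advice
```

With T = 10, P = 100, S = 1000 and X = 64, taking t = 1 gives time 16 with
6400 processors, while t = 64 gives time 640 with only 100 processors; the
advice bound is the same in both cases.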

Theorem 6 will be proved as follows: first we show that we can
eliminate probabilistic choice from M if e(n) is sufficiently small;
then we show how to make e(n) sufficiently small.


We can assume a constant c > 1 such that M has at most c choices of
next moves from any configuration. Fix some input length n ≥ 0. A parallel
choice sequence P is of the form p_0, ..., p_{T(n)-1}, where p_i gives the
choices of moves made at step i, for i = 0, 1, ..., T(n)-1. Let S_{T(n)}
be all choice sequences of length T(n). Given an input w ∈ {0,1}^n, a
choice sequence in S_{T(n)} induces a computation sequence of M. Let

    R_{T(n)}(w) = {P ∈ S_{T(n)} | (w ∈ L(M) and M has an accepting
        computation sequence on input w and choice sequence P) or
        (w ∉ L(M) and M has a nonaccepting computation sequence on
        input w and choice sequence P)}.

LEMMA 6.1. If e(n) < 2^{-n}, then there is a deterministic nonuniform
P-RAM which accepts L(M) with time bound O(T(n)), processor bound
P(n), and advice bound O(S(n)).

Proof. It suffices to show (*):

(*) if e(n) < 2^{-n} then there exists some choice sequence P* ∈ S_{T(n)}
such that for all w ∈ {0,1}^n, P* ∈ R_{T(n)}(w).

Our proof is by contradiction (and thus is not constructive). For
each P ∈ S_{T(n)} let f(P) = |{w ∈ {0,1}^n | P ∉ R_{T(n)}(w)}| and let
r = |S_{T(n)}|. Suppose (*) does not hold, so 1 ≤ f(P) for all P ∈ S_{T(n)}.
Since each input w has at most e(n) r choice sequences outside R_{T(n)}(w),
the sum of f(P) over all P is at most 2^n e(n) r. Hence

    2^n ≥ (1/(r e(n))) Σ_{P ∈ S_{T(n)}} f(P)

        ≥ (1/r) (r/e(n))

        = 1/e(n)

        > 2^n, a contradiction. □
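
The counting argument can be illustrated on a toy instance. The sketch
below is not a construction from the lemma (whose proof is
nonconstructive); it assumes choice sequences are numbered 0, ..., r-1 and
that each input comes with an explicit set of "bad" sequences, and it
simply searches for a sequence that is bad for no input. The function
name is hypothetical.

```python
def find_good_sequence(num_sequences, bad_sets):
    """Return a choice sequence lying in no bad set, or None if every one is bad.

    bad_sets[w] holds the sequences on which input w gets the wrong answer.
    """
    covered = set().union(*bad_sets)
    for p in range(num_sequences):
        if p not in covered:
            return p
    return None
```

When every bad set holds fewer than r/2^n sequences and there are 2^n
inputs, the bad sets together cover fewer than r sequences, so the search
must succeed. The exhaustive search itself takes time proportional to r,
which is why the lemma supplies the good sequence as advice instead.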

LEMMA 6.2. For any t(n), 1 ≤ t(n) ≤ X(n), there is a probabilistic
P-RAM M' which accepts L(M) with acceptance and rejection errors e'_A(n),
e'_R(n), where max(e'_A(n), e'_R(n)) < 2^{-n}, and time bound
O(T(n)t(n) + log(X(n)/t(n))) and processor bound O(P(n)X(n)/t(n)).
Proof. Let w ∈ {0,1}^n be the input string, for some n ≥ 0. Our
probabilistic P-RAM M' will simulate M on input w a total of X(n)
times; these simulations will be done by ⌈X(n)/t(n)⌉ groups of P(n)
probabilistic RAMs, with each group simulating M t(n) times. M' is
allowed to enter an accepting configuration only if M enters an
accepting configuration on at least X(n)/2 of the X(n) trials. (This
technique of determining the consensus of a series of trials is due to
[Bennett and Gill, 81].) The count of successful trials can be computed
in log(X(n)/t(n)) parallel time. Writing C(X(n), i) for the binomial
coefficient, the acceptance error of M' is

    e'_A(n) ≤ Σ_{i=X(n)/2}^{X(n)} C(X(n), i) e(n)^i (1 - e(n))^{X(n)-i}

            ≤ (4e(n)(1 - e(n)))^{X(n)/2}   by bounds of [Chernoff, 52], also
                                           given in [Feller, 57]

            < 2^{-n}   for given X(n) > 2n/log(1/(4e(n)(1 - e(n)))).

Also we can similarly show the error of rejection e'_R(n) < 2^{-n}. Hence
max(e'_A(n), e'_R(n)) < 2^{-n} as claimed. □
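
The error bound in this proof can be checked numerically. The following
minimal sketch assumes independent trials that each err with probability
exactly e, and base-2 logarithms; it compares the exact probability that
at least half the trials err with the bound (4e(1-e))^{X/2}. All function
names are hypothetical.

```python
import math

def num_trials(n, e):
    """Smallest integer at least (1 + 2n) / log2(1 / (4 e (1 - e)))."""
    return math.ceil((1 + 2 * n) / math.log2(1 / (4 * e * (1 - e))))

def majority_error(X, e):
    """Probability that at least half of X independent trials err."""
    start = math.ceil(X / 2)
    return sum(math.comb(X, i) * e**i * (1 - e)**(X - i)
               for i in range(start, X + 1))

def tail_bound(X, e):
    """The bound (4 e (1 - e))^(X/2) used in the proof."""
    return (4 * e * (1 - e)) ** (X / 2)
```

For n = 10 and e = 1/4 this gives 51 trials, and the exact majority error
lies below the bound, which in turn lies below 2^{-10}.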

Theorem 6 follows immediately by applying Lemma 6.1 to the probabilistic
P-RAM M' derived by Lemma 6.2.

By  applying  Theorem  6  to  Theorems  3.1-3,  we  have: 

COROLLARY 6.1. There exist unit-cost nonuniform deterministic P-RAMs
with time bound O(log n), processor and advice bound O(n^4 log n), which
given a graph G with n vertices, can test (a) whether G has a path
between two given vertices and can also test (b) whether G is not
bipartite.
COROLLARY 6.2. There exists a unit-cost nonuniform deterministic
P-RAM with time bound O((log n)^2), processor and advice bound n^{O(1)}, which
can test if a graph of n vertices has a perfect matching.

COROLLARY 6.3. There exist unit-cost nonuniform deterministic
P-RAMs with time bound O(n), processor and advice bound O(n^2), which
can test: given a polynomial of degree O(n), does it have a root in
GF(p^n)?
7.  CONCLUSION

This  paper  has  primarily  considered  the  power  of  probabilistic  choice 
for  parallel  RAMs.  Theorems  3.2-5  also  hold  for  fixed  connection  parallel 
networks  with  probabilistic  processors.  Theorems  4.1  and  4.2  can  be 
extended  to  similar  simulation  results  for  other  probabilistic  parallel 
machines,  such  as  the  hardware  modification  machines  (HMMs)  of  [Cook,  80] 
augmented  with  probabilistic  choice  (see  [Reif,  81]).  Also  Theorem  4 
generalizes  to  other  probabilistic  parallel  machines  such  as  HMMs  and 
circuits  with  probabilistic  choice. 